Abstract

We consider linear-programming (LP) decoding of low-density parity-check (LDPC) codes. While it is clear that 
one can use any general-purpose LP solver to solve the LP that appears in the decoding problem, we argue in this 
paper that the LP at hand is equipped with a lot of structure that one should take advantage of. Towards this goal, 
we study the dual LP and show how coordinate-ascent methods lead to very simple update rules that are tightly 
connected to the min-sum algorithm. Moreover, replacing minima in the formula of the dual LP with soft-minima 
one obtains update rules that are tightly connected to the sum-product algorithm. This shows that LP solvers with 
complexity similar to the min-sum algorithm and the sum-product algorithm are feasible. Finally, we also discuss 
some sub-gradient-based methods. 



1 Introduction 

Linear-programming (LP) decoding [1], [2] has recently 
emerged as an interesting option for decoding low- 
density parity-check (LDPC) codes. Indeed, the obser- 
vations in [3], [4], [5] suggest that the LP decoding 
performance is very close to the message-passing
iterative (MPI) decoding performance. Of course, one
can use any general-purpose LP solver to solve the
LP that appears in LP decoding; in this paper, however,
we argue that one should take advantage of the
special structure of the LP at hand in order to formulate
efficient algorithms that provably find the optimum of
the LP.

Feldman et al. [6] briefly mention the use of sub- 
gradient methods for solving the LP of an early ver- 
sion of the LP decoder (namely for turbo-like codes). 
Moreover, Yang et al. [7] present a variety of interesting 
approaches to solve the LP where they use some of the 
special features of the LP at hand. However, we believe
that one can take much more advantage of the structure 
that is present: this paper shows some results in that 
direction. 

So far, MPI decoding has been successfully used in
applications where block error rates on the order of
$10^{-5}$ are needed, because for these block error rates
the performance of MPI decoding can be guaranteed by
simulation results. However, for applications like
magnetic recording, where one desires block error
rates on the order of $10^{-15}$ and less, it is very difficult
to guarantee that MPI decoding achieves such low block
error rates for a given signal-to-noise ratio. The problem
is that simulations are too time-consuming and that the
known analytical results are not strong enough. Our
hope and main motivation for the present work is that
efficient LP decoders, together with analytical results
on LP decoding (see e.g. [8], [9], [10]), can show that
efficient decoders exist for which low block error rates
can be guaranteed for a certain signal-to-noise ratio.

This paper is structured as follows. We start off by
introducing in Sec. 2 the primal LP that appears in
LP decoding. In Sec. 3 we formulate the dual LP and
in Secs. 4 and 5 we consider a "softened" version of
this dual LP. Then, in Secs. 6 and 7 we propose some
efficient decoding algorithms and in Sec. 8 we show
some simulation results. Finally, in Sec. 9 we offer
some conclusions and in the appendix we present the
proofs and some additional material.

Before going to the main part of the paper, let us fix
some notation. We let $\mathbb{R}$, $\mathbb{R}_{+}$, and $\mathbb{R}_{++}$ be the set of real
numbers, the set of non-negative real numbers, and the
set of positive real numbers, respectively. Moreover, we
will use the canonical embedding of the set $\mathbb{F}_2 = \{0, 1\}$
into $\mathbb{R}$. The convex hull of a set $A \subseteq \mathbb{R}^n$ is denoted
by $\mathrm{conv}(A)$. If $A$ is a subset of $\mathbb{F}_2^n$ then $\mathrm{conv}(A)$
denotes the convex hull of the set $A$ after $A$ has been
canonically embedded in $\mathbb{R}^n$. The $i$-th component of
a vector $x$ will be called $[x]_i$ and the element in the
$j$-th row and $i$-th column of a matrix $A$ will be called
$[A]_{j,i}$.

Moreover, we will use Iverson's convention, i.e. for
a statement $A$ we have $[A] = 1$ if $A$ is true and $[A] = 0$
otherwise. From this we also derive the notation
$\llbracket A \rrbracket \triangleq -\log [A]$, i.e. $\llbracket A \rrbracket = 0$ if $A$ is true and
$\llbracket A \rrbracket = +\infty$ otherwise. Let $A$ and $X$ be some arbitrary
sets fulfilling $A \subseteq X$. A function like
$X \to \mathbb{R}_{+} : x \mapsto [x \in A]$ is called an indicator function
for the set $A$, whereas a function like
$X \to \mathbb{R}_{+} : x \mapsto \llbracket x \in A \rrbracket$ is called a neglog indicator
function for the set $A$. Of course, this second function
can also be considered as a cost or penalty function.

Fig. 1. Representative part of the FFG for the augmented cost
function in (1). (Note that this FFG has an additively written global
function.)

Fig. 2. Representative part of the FFG for the augmented cost
function in (2). Function nodes with a tilde sign in them mean the
following: if such a function node is connected to edges $u$ and $v$
then the function value is $-\llbracket u = -v \rrbracket$. (Note that this FFG has an
additively written global function.)

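Throughout the paper, we will consider a binary
linear code $C$ that is defined by a parity-check matrix
$H$ of size $m$ by $n$. Based on $H$, we define the sets

$\mathcal{I} \triangleq \mathcal{I}(H) \triangleq \{1, \ldots, n\}$,
$\mathcal{J} \triangleq \mathcal{J}(H) \triangleq \{1, \ldots, m\}$,
$\mathcal{I}_j \triangleq \mathcal{I}_j(H) \triangleq \{i \in \mathcal{I} \mid [H]_{j,i} = 1\}$ for each $j \in \mathcal{J}$,
$\mathcal{J}_i \triangleq \mathcal{J}_i(H) \triangleq \{j \in \mathcal{J} \mid [H]_{j,i} = 1\}$ for each $i \in \mathcal{I}$,
$\mathcal{E} \triangleq \mathcal{E}(H) \triangleq \{(i,j) \in \mathcal{I} \times \mathcal{J} \mid i \in \mathcal{I}, j \in \mathcal{J}_i\}
 = \{(i,j) \in \mathcal{I} \times \mathcal{J} \mid j \in \mathcal{J}, i \in \mathcal{I}_j\}$.

Moreover, for each $j \in \mathcal{J}$ we define the codes
$C_j \triangleq C_j(H) \triangleq \{x \in \mathbb{F}_2^n \mid h_j x^{\mathsf{T}} = 0 \pmod 2\}$,
where $h_j$ is the $j$-th row of $H$. Note that the code $C_j$ is
a code of length $n$ where all positions not in $\mathcal{I}_j$ are
unconstrained.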
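
As an illustration of these definitions, the following minimal Python sketch builds the index sets from a small parity-check matrix. The function names `index_sets` and `in_local_code` and the example matrix are our own illustrative choices; they are assumptions and do not appear in the text above.

```python
from typing import List

def index_sets(H: List[List[int]]):
    """Return (I, J, I_of, J_of, E) for a binary parity-check matrix H.

    Indices are 1-based to match the notation in the text:
    I_of[j] plays the role of I_j, J_of[i] the role of J_i.
    """
    m, n = len(H), len(H[0])
    I = list(range(1, n + 1))
    J = list(range(1, m + 1))
    I_of = {j: {i for i in I if H[j - 1][i - 1] == 1} for j in J}
    J_of = {i: {j for j in J if H[j - 1][i - 1] == 1} for i in I}
    E = {(i, j) for i in I for j in J_of[i]}
    return I, J, I_of, J_of, E

def in_local_code(H: List[List[int]], j: int, x: List[int]) -> bool:
    """True if x satisfies the j-th parity check (1-based), i.e. x is in C_j."""
    return sum(h * xi for h, xi in zip(H[j - 1], x)) % 2 == 0

# Hypothetical toy matrix, chosen only for illustration.
H = [[1, 1, 0, 1],
     [0, 1, 1, 1]]
I, J, I_of, J_of, E = index_sets(H)
```

Both descriptions of the edge set given above yield the same set, which can be checked by comparing `E` with `{(i, j) for j in J for i in I_of[j]}`.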

We will express the linear programs in this paper
in the framework of Forney-style factor graphs
(FFG) [11], [12], [13], sometimes also called normal
graphs. For completeness we state their formal
definition. An FFG is a graph $G(V, E)$ with vertex set
$V$ and edge set $E$. To each edge $e$ in the graph we
associate a variable $x_e$ defined over a suitably chosen
alphabet $\mathcal{X}_e$. Let $v$ be a node in the FFG and let $E_v$
be the set of edges incident to $v$. Any node $v$ in the
graph is associated with a function $f_v$ with domain
$\mathcal{X}_{e_1} \times \mathcal{X}_{e_2} \times \cdots \times \mathcal{X}_{e_t}$ where $\{e_1, e_2, \ldots, e_t\} = E_v$.
The co-domain of $f_v$ is typically $\mathbb{R}$ or $\mathbb{R}_{+}$.

FFGs typically come in two flavors, either
representing the factorization of a function into a product
of terms or a decomposition of an additive cost
function. In our case we will exclusively deal with the
latter case. The global function $g(x_{e_1}, x_{e_2}, \ldots, x_{e_{|E|}})$
represented by an FFG is then given by the sum
$g(x_{e_1}, x_{e_2}, \ldots, x_{e_{|E|}}) \triangleq \sum_{v \in V} f_v$.



2 The Primal Linear Program 

The code $C$ is used for data transmission over a
binary-input memoryless channel with channel law
$P_{Y|X}(y|x) = \prod_{i \in \mathcal{I}} P_{Y|X}(y_i|x_i)$. Upon observing
$Y = y$, the maximum-likelihood decoding (MLD) rule
decides for $\hat{x}(y) = \arg\max_{x \in C} P_{Y|X}(y|x)$. This can
also be written as

MLD1: maximize $P_{Y|X}(y|x)$
      subject to $x \in C$.

It is clear that instead of $P_{Y|X}(y|x)$ we can also
maximize $\log P_{Y|X}(y|x) = \sum_{i \in \mathcal{I}} \log P_{Y|X}(y_i|x_i)$.
Introducing
$\lambda_i \triangleq \lambda_i(y_i) \triangleq \log\left(\frac{P_{Y|X}(y_i|0)}{P_{Y|X}(y_i|1)}\right)$, $i \in \mathcal{I}$,
and noting that
$\log P_{Y|X}(y_i|x_i) = -\lambda_i x_i + \log P_{Y|X}(y_i|0)$,
MLD1 can then be rewritten to read

MLD2: minimize $\sum_{i \in \mathcal{I}} \lambda_i x_i$
      subject to $x \in C$.
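
The equivalence of MLD1 and MLD2 can be checked by brute force on a toy example. The sketch below assumes a binary symmetric channel with crossover probability `eps`; the text above does not fix a channel, so this choice, the helper names, and the example matrix are all illustrative assumptions.

```python
import itertools
import math

def codewords(H):
    """Enumerate all codewords of the code with parity-check matrix H."""
    n = len(H[0])
    return [x for x in itertools.product((0, 1), repeat=n)
            if all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in H)]

def bsc_likelihood(y_i, x_i, eps):
    """P(y_i | x_i) for a binary symmetric channel (assumed channel model)."""
    return 1.0 - eps if y_i == x_i else eps

def llr(y_i, eps):
    """lambda_i = log(P(y_i | 0) / P(y_i | 1))."""
    return math.log(bsc_likelihood(y_i, 0, eps) / bsc_likelihood(y_i, 1, eps))

def mld1(H, y, eps):
    """Maximize the likelihood over all codewords."""
    return max(codewords(H),
               key=lambda x: math.prod(bsc_likelihood(yi, xi, eps)
                                       for yi, xi in zip(y, x)))

def mld2(H, y, eps):
    """Minimize the linear cost sum_i lambda_i * x_i over all codewords."""
    lam = [llr(yi, eps) for yi in y]
    return min(codewords(H), key=lambda x: sum(l * xi for l, xi in zip(lam, x)))

# Hypothetical toy code and received word, chosen so that the ML codeword is unique.
H = [[1, 1, 0, 1],
     [0, 1, 1, 1]]
y = (1, 1, 0, 0)
x_ml_1 = mld1(H, y, eps=0.1)
x_ml_2 = mld2(H, y, eps=0.1)
```

Both functions enumerate the whole code, which is only feasible for very short block lengths; this is exactly the complexity problem that motivates the relaxation discussed next.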



Because the cost function is linear, and a linear function 
attains its minimum at the extremal points of a convex 
set, this is essentially equivalent to 



MLD3: minimize $\sum_{i \in \mathcal{I}} \lambda_i x_i$
      subject to $x \in \mathrm{conv}(C)$.



Although this is a linear program, it can usually not be
solved efficiently because its description complexity is
usually exponential in the block length of the code.

However, one might try to solve a relaxation of
MLD3. Noting that
$\mathrm{conv}(C) \subseteq \mathrm{conv}(C_1) \cap \cdots \cap \mathrm{conv}(C_m)$
(which follows from the fact that $C = C_1 \cap \cdots \cap C_m$),
Feldman, Wainwright, and Karger [1], [2] defined the
(primal) linear programming decoder (PLPD) to be given
by the solution of the linear program

PLPD1: minimize $\sum_{i \in \mathcal{I}} \lambda_i x_i$
       subject to $x \in \mathrm{conv}(C_j)$ $(j \in \mathcal{J})$.



The inequalities that are implied by the expression
$x \in \mathrm{conv}(C_j)$ can be found in [1], [2], [3], [4].
Although PLPD1 is usually suboptimal compared to
MLD, it is especially attractive for LDPC codes for
two reasons: firstly, for these codes the description
complexities of $\mathrm{conv}(C_j)$, $j \in \mathcal{J}$, turn out to be
low [2], [4] and, secondly, the relaxation is relatively
benign only if the weight of the parity checks is low.
There are many ways of reformulating this PLPD1 rule
by introducing auxiliary variables: one way that we
found particularly useful is shown as PLPD2 below.
The reason for its usefulness is that there is a
one-to-one correspondence between parts of the program
and the FFG shown in Fig. 1, as we will discuss
later on. Indeed, while the notation may seem heavy
at first glance, it precisely reflects the structure of the
constraints that are summarily folded into the seemingly
simpler constraint $x \in \mathrm{conv}(C_j)$ $(j \in \mathcal{J})$ of PLPD1.
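
To make the constraint $x \in \mathrm{conv}(C_j)$ concrete, the sketch below tests membership using the odd-subset inequalities of [1], [2]: for every odd-sized subset $S$ of the check's support, the sum of $x_i$ over $S$ minus the sum of $x_i$ over the rest of the support is at most $|S| - 1$, together with $0 \le x_i \le 1$. The function names, the 0-based indexing, the tolerance `tol`, and the example matrix are illustrative assumptions; the enumeration is exponential in the check weight and is meant only for low-weight checks.

```python
import itertools

def in_check_polytope(x, support, tol=1e-9):
    """Test x in conv(C_j) for a single parity check with the given support."""
    if any(xi < -tol or xi > 1 + tol for xi in x):
        return False
    for size in range(1, len(support) + 1, 2):        # odd subset sizes only
        for S in itertools.combinations(support, size):
            lhs = (sum(x[i] for i in S)
                   - sum(x[i] for i in support if i not in S))
            if lhs > size - 1 + tol:
                return False
    return True

def in_relaxed_polytope(x, H):
    """Test x against the intersection of conv(C_j) over all rows j of H."""
    supports = [[i for i, h in enumerate(row) if h == 1] for row in H]
    return all(in_check_polytope(x, s) for s in supports)

# Hypothetical toy matrix (0-based column indices).
H = [[1, 1, 0, 1],
     [0, 1, 1, 1]]
codeword_ok = in_relaxed_polytope([1, 1, 1, 0], H)           # a codeword
fractional_ok = in_relaxed_polytope([0.5, 0.5, 0.5, 0.5], H)  # a fractional point
non_codeword_ok = in_relaxed_polytope([1, 0, 0, 0], H)       # violates check 1
```

A general-purpose LP solver would take these inequalities as explicit constraints; the remainder of the paper is about avoiding that generic route.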



PLPD2:

Here we used the following codes, variables and
vectors. The code $\mathcal{A}_i \subseteq \{0, 1\}^{|\{0\} \cup \mathcal{J}_i|}$, $i \in \mathcal{I}$, is the set
containing the all-zeros vector and the all-ones vector
of length $|\mathcal{J}_i| + 1$, and $\mathcal{B}_j \subseteq \{0, 1\}^{|\mathcal{I}_j|}$, $j \in \mathcal{J}$, is the
code $C_j$ shortened at the positions $\mathcal{I} \setminus \mathcal{I}_j$ (see
Footnote 1). For $i \in \mathcal{I}$ we will also use the vectors $u_i$
where the entries are indexed by $\{0\} \cup \mathcal{J}_i$ and denoted
by $u_{i,j} \triangleq [u_i]_j$, and for $j \in \mathcal{J}$ we will use the vectors
$v_j$ where the entries are indexed by $\mathcal{I}_j$ and denoted by
$v_{j,i} \triangleq [v_j]_i$. Later on, we will use a similar notation
for the entries of $a_i$ and $b_j$, i.e. we will use
$a_{i,j} \triangleq [a_i]_j$ and $b_{j,i} \triangleq [b_j]_i$, respectively.

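The above optimization problem is elegantly
represented by the FFG shown in Fig. 1. In order to
express the LP itself in an FFG we have to express the
constraints as additive cost terms. This is easily
accomplished by assigning the cost $+\infty$ to any configuration
of variables that does not satisfy the LP constraints.
The above minimization problem is then equivalent to
the (unconstrained) minimization of the augmented cost
function

$\sum_{i \in \mathcal{I}} \lambda_i x_i
 + \sum_{i \in \mathcal{I}} \llbracket x_i = u_{i,0} \rrbracket
 + \sum_{(i,j) \in \mathcal{E}} \llbracket u_{i,j} = v_{j,i} \rrbracket
 + \sum_{i \in \mathcal{I}} A_i(u_i)
 + \sum_{j \in \mathcal{J}} B_j(v_j)$,   (1)

where for all $i \in \mathcal{I}$ and all $j \in \mathcal{J}$, respectively, we
introduced

$A_i(u_i) \triangleq
 \llbracket \sum_{a_i \in \mathcal{A}_i} \alpha_{i,a_i} a_i = u_i \rrbracket
 + \sum_{a_i \in \mathcal{A}_i} \llbracket \alpha_{i,a_i} \ge 0 \rrbracket
 + \llbracket \sum_{a_i \in \mathcal{A}_i} \alpha_{i,a_i} = 1 \rrbracket$,

$B_j(v_j) \triangleq
 \llbracket \sum_{b_j \in \mathcal{B}_j} \beta_{j,b_j} b_j = v_j \rrbracket
 + \sum_{b_j \in \mathcal{B}_j} \llbracket \beta_{j,b_j} \ge 0 \rrbracket
 + \llbracket \sum_{b_j \in \mathcal{B}_j} \beta_{j,b_j} = 1 \rrbracket$.

With this, the global function of the FFG in Fig. 1
equals the augmented cost function in (1) and we have
represented the LP in terms of an FFG (see Footnote 2).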

Of course, any reader who is familiar with LDPC
codes will have no problem making a connection
between the FFG of Fig. 1 and the standard representation
as a Tanner graph. Indeed, a node $A_i$ corresponds to
a variable node in a Tanner graph and a node $B_j$
takes over the role of a parity check node. However,
instead of simply assigning a variable to node $A_i$ we
assign a local set of constraints corresponding to the
convex hull of a repetition code. These are the equations

$\sum_{a_i \in \mathcal{A}_i} \alpha_{i,a_i} a_i = u_i$,
$\alpha_{i,a_i} \ge 0$ $(a_i \in \mathcal{A}_i)$,
$\sum_{a_i \in \mathcal{A}_i} \alpha_{i,a_i} = 1$.

Similarly, the equations for the convex hull of a simple
parity-check code can be identified for nodes $B_j$.

Footnote 1: For the codes $C$ under consideration this means that
$\mathcal{B}_j$ contains all vectors of length $|\mathcal{I}_j|$ of even parity.

Footnote 2: Note that instead of drawing function nodes for the
terms that appear in the definition of $A_i(u_i)$ and an edge for the
variables $\{\alpha_{i,a_i}\}_{a_i \in \mathcal{A}_i}$, we preferred to simply draw a box
for $A_i$, $i \in \mathcal{I}$. A similar comment applies to $B_j$, $j \in \mathcal{J}$. An
alternative approach would have been to apply the concept of
"closing the box" by Loeliger, cf. e.g. [13], where $A_i(u_i)$ would be
defined as the minimum over $\{\alpha_{i,a_i}\}_{a_i \in \mathcal{A}_i}$ of the above $A_i(u_i)$
function. Here we preferred the first approach because we wanted
to keep variables like $u_{i,j}$ and $\alpha_{i,a_i}$ at the "same level".



3 The Dual Linear Program 

The dual linear program [14] of PLPD2 is 



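DLPD2:

Expressing the constraints as additive cost terms, the
above maximization problem is equivalent to the
(unconstrained) maximization of the augmented cost
function

$\sum_{i \in \mathcal{I}} A'_i(u'_i)
 + \sum_{j \in \mathcal{J}} B'_j(v'_j)
 - \sum_{(i,j) \in \mathcal{E}} \llbracket u'_{i,j} = -v'_{j,i} \rrbracket
 - \sum_{i \in \mathcal{I}} \llbracket u'_{i,0} = -\lambda_i \rrbracket$,   (2)

with

$A'_i(u'_i) \triangleq \phi'_i
 - \llbracket \phi'_i \le \min_{a_i \in \mathcal{A}_i} \langle -u'_i, a_i \rangle \rrbracket$,

$B'_j(v'_j) \triangleq \theta'_j
 - \llbracket \theta'_j \le \min_{b_j \in \mathcal{B}_j} \langle -v'_j, b_j \rangle \rrbracket$.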



The augmented cost function in (2) is represented
by the FFG in Fig. 2 (see Footnote 3). (For deriving
DLPD2 we used the techniques introduced in [15], [16];
note that the techniques presented there can also be used
to systematically derive the dual function of much
more complicated functions that are sums of convex
functions. Alternatively, one might also use results from
monotropic programming, cf. e.g. [17].)

Because for each $i \in \mathcal{I}$ the variable $\phi'_i$ is involved
in only one inequality, the optimal solution does not
change if we replace the corresponding inequality signs
by equality signs in DLPD2. The same comment holds
for all $\theta'_j$, $j \in \mathcal{J}$.

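Definition 1: Let $\mathcal{A} \triangleq \mathcal{A}_1 \times \cdots \times \mathcal{A}_n$ and let
$\mathcal{B} \triangleq \mathcal{B}_1 \times \cdots \times \mathcal{B}_m$. For $a \triangleq (a_1, \ldots, a_n) \in \mathcal{A}$ and
$b \triangleq (b_1, \ldots, b_m) \in \mathcal{B}$ define

$g'_{a,b}(u') \triangleq \sum_{i \in \mathcal{I}} \langle -u'_i, a_i \rangle
 + \sum_{j \in \mathcal{J}} \langle -v'_j, b_j \rangle$,

where, with a slight abuse of notation, $v'$ is such that
$v'_{j,i} = -u'_{i,j}$ for all $(i,j) \in \mathcal{E}$. Moreover, we call
$(a, b) \in \mathcal{A} \times \mathcal{B}$ consistent if $a_{i,j} = b_{j,i}$ for all
$(i,j) \in \mathcal{E}$.
□

Footnote 3: A similar comment applies here as in Footnote 2. Here,
the $\phi'_i$ and $\theta'_j$ have to be seen as dual variables that would
appear as edges in a more detailed drawing of the boxes $A'_i(u'_i)$
and $B'_j(v'_j)$, respectively.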

Obviously, $g'_{a,b}(u')$ is a linear function in $u'$. With
the above definition, DLPD2 can be rewritten to read

DLPD3: max.     $\min_{(a,b) \in \mathcal{A} \times \mathcal{B}} g'_{a,b}(u')$
       subj. to $u'_{i,0} = -\lambda_i$ $(i \in \mathcal{I})$.



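Lemma 2: Let $u'$ be such that $u'_{i,0} = -\lambda_i$, $i \in \mathcal{I}$. If
$(a, b) \in \mathcal{A} \times \mathcal{B}$ is consistent then $g'_{a,b}(u')$ is constant in
$u'$. Moreover, $g'_{a,b}(u') = \langle \lambda, x \rangle$, where $x$ is such that
$x_i = a_{i,0}$, $i \in \mathcal{I}$. If $(a, b) \in \mathcal{A} \times \mathcal{B}$ is not consistent
then $g'_{a,b}(u')$ is not a constant function for at least one
$u'_{i,j}$, $(i,j) \in \mathcal{E}$.
□

Proof: See Sec. A.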

4 A Softened Dual Linear Program 



For any $\kappa \in \mathbb{R}_{++}$ we define the soft-minimum operator
to be

${\min_\ell}^{(\kappa)} \{z_\ell\} \triangleq
 -\frac{1}{\kappa} \log\left( \sum_\ell e^{-\kappa z_\ell} \right)$.

(Note that $\kappa$ can be given the interpretation of an inverse
temperature.) One can easily check that
${\min_\ell}^{(\kappa)} \{z_\ell\} \le \min_\ell \{z_\ell\}$ with equality in the
limit $\kappa \to +\infty$. Replacing the minimum operators in DLPD2
by soft-minimum operators, we obtain the modified
optimization problem

SDLPD2:

In the following, unless noted otherwise, we will set
$\kappa_i \triangleq \kappa$, $i \in \mathcal{I}$, and $\kappa_j \triangleq \kappa$, $j \in \mathcal{J}$, for some
$\kappa \in \mathbb{R}_{++}$. It is clear that in the limit $\kappa \to +\infty$ we
recover DLPD2.
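
The inequality between the soft-minimum and the minimum, and the size of the gap, follow from a short calculation that we add here for completeness. The symbols $L$ (the number of arguments) and $z_{\min} \triangleq \min_\ell z_\ell$ are our own shorthand and do not appear in the text above.

```latex
e^{-\kappa z_{\min}}
  \;\le\; \sum_{\ell=1}^{L} e^{-\kappa z_\ell}
  \;\le\; L \, e^{-\kappa z_{\min}}
\quad\Longrightarrow\quad
z_{\min} - \frac{\log L}{\kappa}
  \;\le\; {\min_\ell}^{(\kappa)} \{ z_\ell \}
  \;\le\; z_{\min} .
```

The gap is therefore at most $(\log L)/\kappa$, which vanishes as $\kappa \to +\infty$, in agreement with the limit statement above.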



5 A Comment on the Dual of the 
Softened Dual Linear Program 

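Let

$H(\alpha_i) \triangleq -\sum_{a_i \in \mathcal{A}_i} \alpha_{i,a_i} \log(\alpha_{i,a_i})$

be the entropy of a random variable whose pmf takes
on the values $\{\alpha_{i,a_i}\}_{a_i \in \mathcal{A}_i}$. Similarly, let

$H(\beta_j) \triangleq -\sum_{b_j \in \mathcal{B}_j} \beta_{j,b_j} \log(\beta_{j,b_j})$.

The dual of SDLPD2 can then be written as

DSLPD2: min.     $\sum_{i \in \mathcal{I}} \lambda_i x_i
                  - \frac{1}{\kappa} \sum_{i \in \mathcal{I}} H(\alpha_i)
                  - \frac{1}{\kappa} \sum_{j \in \mathcal{J}} H(\beta_j)$
        subj. to the same constraints as in PLPD2.

We note that this is very close to the following Bethe
free energy optimization problem, cf. e.g. [18]

BFE1: min.     $\sum_{i \in \mathcal{I}} \lambda_i x_i
                + \frac{1}{\kappa} \sum_{i \in \mathcal{I}} (|\mathcal{J}_i| - 1) H(\alpha_i)
                - \frac{1}{\kappa} \sum_{j \in \mathcal{J}} H(\beta_j)$
      subj. to the same constraints as in PLPD2,

which, in turn, can also be written as

BFE2: min.     $\sum_{i \in \mathcal{I}} \lambda_i x_i
                - \frac{1}{\kappa} \sum_{i \in \mathcal{I}} H(\alpha_i)
                + \frac{1}{\kappa} \sum_{i \in \mathcal{I}} |\mathcal{J}_i| H(\alpha_i)
                - \frac{1}{\kappa} \sum_{j \in \mathcal{J}} H(\beta_j)$
      subj. to the same constraints as in PLPD2.

Without going into the details we note that the term
$+\frac{1}{\kappa} \sum_{i \in \mathcal{I}} (|\mathcal{J}_i| - 1) H(\alpha_i)$ is responsible for the fact
that the cost function in BFE2 is usually non-convex
for FFGs with cycles.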



6 Decoding Algorithm 1 

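In the following, we assume that $u'_{i,j}$ and $v'_{j,i}$ are
"coupled", i.e. we always have $u'_{i,j} = -v'_{j,i}$ for all
$(i,j) \in \mathcal{E}$.

The first algorithm that we propose is a
coordinate-ascent-type algorithm for solving SDLPD2. The main
idea is to select edges $(i,j) \in \mathcal{E}$ according to some
update schedule: for each selected edge $(i,j) \in \mathcal{E}$ we
then replace the old values of $u'_{i,j}$, $\phi'_i$, and $\theta'_j$ by new
values such that the dual cost function is increased (or
at least not decreased). Practically, this means that we
have to find a $\bar{u}'_{i,j}$ such that $h'(\bar{u}'_{i,j}) \ge h'(u'_{i,j})$, where

$h'(u'_{i,j}) \triangleq
 {\min_{a_i \in \mathcal{A}_i}}^{(\kappa)} \langle -u'_i, a_i \rangle
 + {\min_{b_j \in \mathcal{B}_j}}^{(\kappa)} \langle -v'_j, b_j \rangle$.

A simple way to achieve this is by setting

$\bar{u}'_{i,j} \triangleq \arg\max_{u'_{i,j}} h'(u'_{i,j})$.   (3)

The variables $\phi'_i$ and $\theta'_j$ are then updated accordingly
so that we obtain a new (dual) feasible point.

Lemma 3: The value of $\bar{u}'_{i,j}$ in (3) is given by

$\bar{u}'_{i,j} = \frac{1}{2} \left( (S'_{i,0} - S'_{i,1}) - (T'_{j,0} - T'_{j,1}) \right)$,

where

$S'_{i,0} \triangleq -{\min_{a_i \in \mathcal{A}_i,\, a_{i,j} = 0}}^{(\kappa)} \langle -\tilde{u}'_i, \tilde{a}_i \rangle$,
$S'_{i,1} \triangleq -{\min_{a_i \in \mathcal{A}_i,\, a_{i,j} = 1}}^{(\kappa)} \langle -\tilde{u}'_i, \tilde{a}_i \rangle$,
$T'_{j,0} \triangleq -{\min_{b_j \in \mathcal{B}_j,\, b_{j,i} = 0}}^{(\kappa)} \langle -\tilde{v}'_j, \tilde{b}_j \rangle$,
$T'_{j,1} \triangleq -{\min_{b_j \in \mathcal{B}_j,\, b_{j,i} = 1}}^{(\kappa)} \langle -\tilde{v}'_j, \tilde{b}_j \rangle$.

Here the vectors $\tilde{u}'_i$ and $\tilde{a}_i$ are the vectors $u'_i$ and $a_i$,
respectively, where the $j$-th position has been omitted.
Similarly, the vectors $\tilde{v}'_j$ and $\tilde{b}_j$ are the vectors
$v'_j$ and $b_j$, respectively, where the $i$-th position has
been omitted. Note that the differences $S'_{i,0} - S'_{i,1}$ and
$T'_{j,0} - T'_{j,1}$, which are required for computing $\bar{u}'_{i,j}$, can
be obtained very efficiently by using the sum-product
algorithm [11].
□

Proof: See Sec. B.

In the introduction we wrote that we would like to
use the special structure of the primal/dual LP at hand;
Lemma 3 is a first example of how this can be done. Please
note that when computing the necessary quantities (for
the case $\kappa = 1$) one has to do computations that are (up to
some flipped signs) equivalent to computations that are
done during message updates while performing
sum-product algorithm decoding of the LDPC code at hand.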
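
The update of Lemma 3 can be illustrated with a small brute-force sketch. It enumerates the local codes explicitly (a repetition code for the variable side, an even-parity code for the check side) instead of using sum-product recursions, so it is only meant for small node degrees. All function names, the tuple layout, and the numerical values are illustrative assumptions.

```python
import itertools
import math

def soft_min(values, kappa):
    """Soft-minimum -(1/kappa) * log(sum_l exp(-kappa * z_l)), shifted for stability."""
    z_min = min(values)
    return z_min - math.log(sum(math.exp(-kappa * (z - z_min)) for z in values)) / kappa

def repetition_code(length):
    """Local code on the variable side: all-zeros and all-ones vector."""
    return [(0,) * length, (1,) * length]

def even_parity_code(length):
    """Local code on the check side: all vectors of even parity (brute force)."""
    return [b for b in itertools.product((0, 1), repeat=length) if sum(b) % 2 == 0]

def neg_inner(w, c):
    """Inner product <-w, c>."""
    return -sum(wk * ck for wk, ck in zip(w, c))

def drop(t, pos):
    """Tuple t with position pos omitted."""
    return t[:pos] + t[pos + 1:]

def side_stat(code, w, pos, bit, kappa):
    """-soft_min of <-w~, c~> over codewords c with c[pos] == bit."""
    vals = [neg_inner(drop(w, pos), drop(c, pos)) for c in code if c[pos] == bit]
    return -soft_min(vals, kappa)

def edge_update(u, v, pos_u, pos_v, kappa):
    """Closed-form maximizer of h' over the coupled variable u[pos_u] = -v[pos_v]."""
    A, B = repetition_code(len(u)), even_parity_code(len(v))
    s0, s1 = (side_stat(A, u, pos_u, b, kappa) for b in (0, 1))
    t0, t1 = (side_stat(B, v, pos_v, b, kappa) for b in (0, 1))
    return 0.5 * ((s0 - s1) - (t0 - t1))

def h_prime(x, u, v, pos_u, pos_v, kappa):
    """Local dual objective h' with the edge variable set to x."""
    u = u[:pos_u] + (x,) + u[pos_u + 1:]
    v = v[:pos_v] + (-x,) + v[pos_v + 1:]
    A, B = repetition_code(len(u)), even_parity_code(len(v))
    return (soft_min([neg_inner(u, a) for a in A], kappa)
            + soft_min([neg_inner(v, b) for b in B], kappa))

# Hypothetical values: u[0] plays the role of u'_{i,0}; u[1] is coupled to v[0].
u = (-0.8, 0.3, -0.1)
v = (-0.3, 0.5, 1.2)
x_new = edge_update(u, v, 1, 0, kappa=2.0)
```

Evaluating `h_prime` at `x_new` and at nearby points gives a quick numerical check that the closed-form value is indeed a maximizer of the local objective.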

Lemma 4: Assume that all the rows of the parity-check
matrix $H$ of the code $C$ have Hamming weight at
least 3 (see Footnote 4). Then, updating cyclically all
edges $(i,j) \in \mathcal{E}$, the above coordinate-ascent algorithm
converges to the maximum of SDLPD2.
□

Proof: See Sec. C.

As we mentioned in the proof of Lemma 4, the
above algorithm can be seen as a Gauss-Seidel-type
algorithm. Let us remark that there are ways to see
sum-product algorithm decoding as applying a
Gauss-Seidel-type algorithm to the dual of the Bethe free
energy, see e.g. [19], [20]; in light of the observations in
Sec. 5 it is not surprising that there is a tight relationship
between our algorithms and the above-mentioned algorithms.

Footnote 4: Note that any interesting code has a parity-check
matrix whose rows have Hamming weight at least 3.

Lemma 5: For $\kappa \to \infty$, the function $h'(u'_{i,j})$ is
maximized by any value $u'_{i,j}$ that lies in the closed
interval between

$(S'_{i,0} - S'_{i,1})$ and $-(T'_{j,0} - T'_{j,1})$,

where

$S'_{i,0} \triangleq -\min_{a_i \in \mathcal{A}_i,\, a_{i,j} = 0} \langle -\tilde{u}'_i, \tilde{a}_i \rangle$,
$S'_{i,1} \triangleq -\min_{a_i \in \mathcal{A}_i,\, a_{i,j} = 1} \langle -\tilde{u}'_i, \tilde{a}_i \rangle$,
$T'_{j,0} \triangleq -\min_{b_j \in \mathcal{B}_j,\, b_{j,i} = 0} \langle -\tilde{v}'_j, \tilde{b}_j \rangle$,
$T'_{j,1} \triangleq -\min_{b_j \in \mathcal{B}_j,\, b_{j,i} = 1} \langle -\tilde{v}'_j, \tilde{b}_j \rangle$.
□

Proof: See Sec. D.

Conjecture 6: Again, we can cyclically update the
edges $(i,j) \in \mathcal{E}$ whereby the new $u'_{i,j}$ is chosen
randomly in the above interval. Although the objective
function for $\kappa \to +\infty$ is concave, it is not everywhere
differentiable. This makes a convergence proof in the
style of Lemma 4 difficult. We think that we can again
use the special structure of the LP at hand to show that
the algorithm cannot get stuck at a suboptimal point.
However, so far we do not have a proof of this fact.
Sec. E discusses briefly why a convergence proof is not
a trivial extension of Lemma 4.
□
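Before ending this section, let us briefly remark how
a codeword decision is obtained from a solution of
DLPD2. Assume that $\hat{x}$ is the pseudo-codeword that
is the solution to PLPD1 or to PLPD2 (see Footnote 5).
Knowing the solution of DLPD2 we cannot directly find
$\hat{x}$; however, we can find out at what positions $\hat{x}$ is 0
and at what positions $\hat{x}$ is 1. Namely, letting
$\check{x} \in \{0, ?, 1\}^n$ have the components

$\check{x}_i \triangleq 0$ if $\langle -u'_i, a_i \rangle|_{a_i = \mathbf{0}} < \langle -u'_i, a_i \rangle|_{a_i = \mathbf{1}}$,
$\check{x}_i \triangleq {?}$ if $\langle -u'_i, a_i \rangle|_{a_i = \mathbf{0}} = \langle -u'_i, a_i \rangle|_{a_i = \mathbf{1}}$,
$\check{x}_i \triangleq 1$ if $\langle -u'_i, a_i \rangle|_{a_i = \mathbf{0}} > \langle -u'_i, a_i \rangle|_{a_i = \mathbf{1}}$,

we have $\check{x}_i = \hat{x}_i$ when $\hat{x}_i$ equals 0 or 1 and $\check{x}_i = {?}$
when $\hat{x}_i \in (0, 1)$. In other words, with the solution
to DLPD2 we do not get the exact $\hat{x}$ in case $\hat{x}$ is
not a codeword. However, as a side remark, because
$\mathrm{supp}(\check{x}) = \mathrm{supp}(\hat{x})$ (where supp is the set of all
non-zero positions) we can use $\check{x}$ to find the stopping
set [21] associated to $\hat{x}$.

Footnote 5: We assume here that there is a unique optimal solution
$\hat{x}$ to PLPD1 or to PLPD2; more general statements can be made for
the case when there is not a unique optimal solution.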
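
The decision rule above only compares the cost of the all-zeros and the all-ones vector of the local repetition code, so it is straightforward to sketch in code. The function name `decide`, the data layout (one tuple per bit holding $u'_{i,0}$ followed by the $u'_{i,j}$), and the numerical tolerance `tol` are illustrative assumptions.

```python
def decide(u_rows, tol=1e-9):
    """Per-bit decision in {0, '?', 1} from the dual variables u'_i.

    u_rows[i] lists the entries of u'_i, i.e. u'_{i,0} and all u'_{i,j}.
    The tolerance tol is an assumption made to handle floating-point ties.
    """
    decisions = []
    for u_i in u_rows:
        cost_zeros = 0.0          # <-u'_i, a_i> at a_i = all-zeros
        cost_ones = -sum(u_i)     # <-u'_i, a_i> at a_i = all-ones
        if cost_zeros < cost_ones - tol:
            decisions.append(0)
        elif cost_zeros > cost_ones + tol:
            decisions.append(1)
        else:
            decisions.append('?')
    return decisions

# Hypothetical dual values for three bits.
example = decide([(-1.0, 0.2), (1.0, 0.5), (-0.5, 0.5)])
```

Positions marked `'?'` correspond to fractional entries of the pseudo-codeword, whose exact values are not recovered by this rule.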



7 Decoding Algorithm 2 

Again, we assume that $u'_{i,j}$ and $v'_{j,i}$ are "coupled",
i.e. we always have $u'_{i,j} = -v'_{j,i}$ for all $(i,j) \in \mathcal{E}$.

While the iterative solutions of the coordinate-ascent
methods that we presented in the previous section
resemble the traditional min-sum algorithm decoding
rules (and sum-product algorithm decoding rules)
relatively closely, other methods for solving the linear
program also offer attractive complexity/performance
trade-offs. We would like to point out one such
algorithm which is well suited for the linear programming
problem arising from the decoding setup. Indeed,
observing the formulation of the dual linear program
DLPD2, sub-gradient methods (see Footnote 6) are readily
available to perform the required maximization. However, in
order to exploit the structure of the problem we focus
our attention on incremental sub-gradient methods [22].
Algorithms belonging to this family of optimization
procedures allow us to exploit the fact that the objective
function is a sum of a number of terms and we can
operate on each term, i.e. each constituent code in
the FFG, individually. In order to derive a concise
formulation of the procedure we start by considering
a check node $j \in \mathcal{J}$. For a particular choice of dual
variables $v'_j$ the contribution of node $j$ to the overall
objective function is

$f'_j(v'_j) \triangleq \min_{b_j \in \mathcal{B}_j} \langle -v'_j, b_j \rangle$.

Let a function $g'_j(v'_j)$ be defined as
$g'_j(v'_j) \triangleq -\arg\min_{b_j \in \mathcal{B}_j} \langle -v'_j, b_j \rangle$ where, if
ambiguities exist, $g'_j(v'_j)$ is the negative of an arbitrary
convex combination of the set of ambiguous vectors. Note
that for obtaining $g'_j(v'_j)$ we can again take advantage of
the special structure of the LP at hand.

Using the defining property of a sub-gradient $d'_j$ at $v'_j$,
namely,

$f'_j(\bar{v}'_j) \le f'_j(v'_j) + \langle d'_j, \bar{v}'_j - v'_j \rangle$,

it can be seen that $g'_j(v'_j)$ is a sub-gradient. We can
then update $v'_j$ as follows:

$v'_j \leftarrow v'_j + \mu_\ell \, g'_j(v'_j)$,

where $\mu_\ell \in \mathbb{R}_{++}$. Given this, one can formulate the
following algorithm: at iteration $\ell$ update consecutively
all check nodes $j \in \mathcal{J}$ and then, in an analogous
manner, update all variable nodes $i \in \mathcal{I}$.

For this algorithm we cannot guarantee that the value
of the objective function increases for each iteration
(not even for small $\mu_\ell$). Nevertheless, its convergence to
the maximum can be guaranteed for a suitably chosen
sequence $\{\mu_\ell\}_{\ell \ge 1}$ [22].
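
A single check-node step of this kind can be sketched as follows. The minimizing local codeword is found by brute-force enumeration of the even-parity code, which is an illustrative simplification (the text notes that the structure of the LP can be exploited instead). The function names, the step size, and the numerical values are assumptions.

```python
import itertools

def even_parity_code(length):
    """All binary vectors of the given length with even parity."""
    return [b for b in itertools.product((0, 1), repeat=length) if sum(b) % 2 == 0]

def neg_inner(v, b):
    """Inner product <-v, b>."""
    return -sum(vk * bk for vk, bk in zip(v, b))

def check_term(v):
    """Contribution of one check node: min over local codewords of <-v, b>."""
    return min(neg_inner(v, b) for b in even_parity_code(len(v)))

def sub_gradient(v):
    """Negative of one minimizing local codeword (ties broken arbitrarily)."""
    b_star = min(even_parity_code(len(v)), key=lambda b: neg_inner(v, b))
    return tuple(-bk for bk in b_star)

def check_step(v, mu):
    """One update v <- v + mu * g(v) for a single check node."""
    g = sub_gradient(v)
    return tuple(vk + mu * gk for vk, gk in zip(v, g))

# Hypothetical dual values at one check node of degree 3.
v = (0.7, -0.2, 1.1)
g = sub_gradient(v)
v_next = check_step(v, mu=0.1)
```

Because `check_term` is a minimum of linear functions, the returned vector satisfies the sub-gradient inequality stated above at every other point, even though a single step need not increase the objective.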

Let us point out that gradient-type methods have also
been used to decode codes in different contexts, see
e.g. the work by Lucas et al. [23]. However, the setup
in [23] has some significant differences to the setup
here: firstly, the objective function of the optimization
problem in [23] does not depend on the observed
log-likelihood ratio vector $\lambda$; secondly, the starting point
in [23] is chosen as a function of $\lambda$.

Footnote 6: The use of sub-gradients is necessary since the objective
function is concave but not everywhere differentiable, cf. e.g. [17].

Fig. 3. Decoding results for a [1000, 500] LDPC code. Curves: LPD
(after max. 64 iterations), LPD (after max. 256 iterations), MSA
(after max. 64 iterations), MSA (after max. 256 iterations). (See
Sec. 8 for more details.)

8 Simulation Results 

As a proof of concept we show some simulation results
for a randomly generated (3, 6)-regular [1000, 500]
LDPC code where four-cycles in the Tanner graph
have been eliminated. Fig. 3 shows the decoding
results based on Decoding Algorithm 1 with the update
rule of Lemma 3 compared with standard min-sum
algorithm decoding [11].

9 Conclusions 

We have discussed some initial steps towards algo- 
rithms that are specially targeted for efficiently solving 
the LP that appears in LP decoding. It has been shown 
that algorithms with memory and time complexity 
similar to min-sum algorithm decoding can be achieved. 
There are many avenues to pursue this topic further, 
e.g. by improving the update schedule, by studying how 
to design codes that allow efficient hardware implemen- 
tation of the proposed algorithms, or by investigating 
other algorithms that use the structure of the LP that 
appears in LP decoding. We hope that this paper raises 
the interest in exploring these research directions. 

Finally, without going into the details, let us remark 
that the algorithms here can also be used to solve certain 
linear programs whose value can be used to obtain 
lower bounds on the minimal AWGNC pseudo-weight 
of parity-check matrices, cf. [8, Claim 3]. (Actually, 
one does not really need to solve the linear program 
in [8, Claim 3] in order to obtain a lower bound on the 
minimum AWGNC pseudo-weight, any dual feasible 
point is good enough for that purpose.) 



Appendix 
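A Proof of Lemma 2

If $(a, b)$ is consistent then

$g'_{a,b}(u')
 = \sum_{i \in \mathcal{I}} \langle -u'_i, a_i \rangle + \sum_{j \in \mathcal{J}} \langle -v'_j, b_j \rangle$
$\quad = -\sum_{i \in \mathcal{I}} u'_{i,0} a_{i,0}
   - \sum_{(i,j) \in \mathcal{E}} u'_{i,j} a_{i,j}
   + \sum_{(i,j) \in \mathcal{E}} u'_{i,j} b_{j,i}$
$\quad = \sum_{i \in \mathcal{I}} \lambda_i a_{i,0}$
$\quad = \sum_{i \in \mathcal{I}} \lambda_i x_i = \langle \lambda, x \rangle$.

On the other hand, if $(a, b)$ is not consistent and
$(i,j) \in \mathcal{E}$ is such that $a_{i,j} \ne b_{j,i}$ then $g'_{a,b}(u')$ is
non-constant in $u'_{i,j}$.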

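B Proof of Lemma 3

This result is obtained by taking the derivative of
$h'(u'_{i,j})$, setting it equal to zero, and solving for $u'_{i,j}$.
Let us go through this procedure step by step. Using
the fact that $u'_{i,j} = -v'_{j,i}$, the function $h'(u'_{i,j})$ can be
written as

$h'(u'_{i,j}) \triangleq
 {\min_{a_i \in \mathcal{A}_i}}^{(\kappa)} \langle -u'_i, a_i \rangle
 + {\min_{b_j \in \mathcal{B}_j}}^{(\kappa)} \langle -v'_j, b_j \rangle$
$\quad = -\frac{1}{\kappa} \log\left( \sum_{a_i \in \mathcal{A}_i}
      e^{+\kappa u'_{i,j} a_{i,j} + \kappa \langle \tilde{u}'_i, \tilde{a}_i \rangle} \right)
   - \frac{1}{\kappa} \log\left( \sum_{b_j \in \mathcal{B}_j}
      e^{-\kappa u'_{i,j} b_{j,i} + \kappa \langle \tilde{v}'_j, \tilde{b}_j \rangle} \right)$
$\quad = -\frac{1}{\kappa} \log\left( e^{\kappa S'_{i,0}} + e^{+\kappa u'_{i,j}} e^{\kappa S'_{i,1}} \right)
   - \frac{1}{\kappa} \log\left( e^{\kappa T'_{j,0}} + e^{-\kappa u'_{i,j}} e^{\kappa T'_{j,1}} \right)$.

Setting the derivative of $h'(u'_{i,j})$ with respect to $u'_{i,j}$
equal to zero we obtain

$0 = \frac{\partial h'(u'_{i,j})}{\partial u'_{i,j}}
   = -\frac{1}{\kappa} \cdot
      \frac{\kappa e^{+\kappa u'_{i,j}} e^{\kappa S'_{i,1}}}
           {e^{\kappa S'_{i,0}} + e^{+\kappa u'_{i,j}} e^{\kappa S'_{i,1}}}
     + \frac{1}{\kappa} \cdot
      \frac{\kappa e^{-\kappa u'_{i,j}} e^{\kappa T'_{j,1}}}
           {e^{\kappa T'_{j,0}} + e^{-\kappa u'_{i,j}} e^{\kappa T'_{j,1}}}$.

Multiplying out we get

$e^{+\kappa u'_{i,j}} e^{\kappa S'_{i,1}} e^{\kappa T'_{j,0}}
 = e^{-\kappa u'_{i,j}} e^{\kappa S'_{i,0}} e^{\kappa T'_{j,1}}$.

This yields

$u'_{i,j} = \frac{1}{2} \left( (S'_{i,0} - S'_{i,1}) - (T'_{j,0} - T'_{j,1}) \right)$,

which is the promised result.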

C Proof of Lemma 4

We can use results from [17, Sec. 2.7], where the
following setup is considered (see Footnote 7). Consider
the optimization problem

maximize   $f(x)$
subject to $x \in X$,

where $X \triangleq X_1 \times \cdots \times X_m$. The set $X_i$ is assumed to be
a closed convex subset of $\mathbb{R}^{n_i}$ and $n = n_1 + \cdots + n_m$.
The vector $x$ is partitioned as $x = (x_1, \ldots, x_m)$ where
each $x_i \in \mathbb{R}^{n_i}$. So the constraint $x \in X$ is equivalent
to $x_i \in X_i$, $i \in \{1, \ldots, m\}$.

The following algorithm, known as block
coordinate-ascent or non-linear Gauss-Seidel method, generates
the next iterate $x^{k+1} = (x_1^{k+1}, \ldots, x_m^{k+1})$, given the
current iterate $x^k = (x_1^k, \ldots, x_m^k)$, according to the
iteration

$x_i^{k+1} \triangleq \arg\max_{\xi \in X_i}
 f(x_1^{k+1}, \ldots, x_{i-1}^{k+1}, \xi, x_{i+1}^{k}, \ldots, x_m^{k})$.   (4)

Proposition 7 ([17, Prop. 2.7.1]): Suppose that $f$ is
continuously differentiable over the set $X$. Furthermore,
suppose that for each $i$ and $x \in X$, the maximum below

$\max_{\xi \in X_i} f(x_1, \ldots, x_{i-1}, \xi, x_{i+1}, \ldots, x_m)$

is uniquely attained. Let $\{x^k\}$ be the sequence
generated by the block coordinate-ascent method (4). Then
every limit point of $\{x^k\}$ is a stationary point.
□

We turn our attention now to our optimization
problem. The fundamental polytope (which is the set
$\bigcap_{j \in \mathcal{J}} \mathrm{conv}(C_j)$) has dimension $n$ if and only if the
parity-check matrix has no rows of Hamming weight 1
and 2. This type of non-degeneracy of PLPD2 implies
the strict concavity of the function that we try to
optimize in SDLPD2. Based on $\lambda$ one can then without loss
of generality define suitable closed intervals for each
variable so that one can apply the above proposition to
our algorithm.

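D Proof of Lemma 5

Define the functions

$s'(u'_{i,j}) \triangleq \min_{a_i \in \mathcal{A}_i} \langle -u'_i, a_i \rangle$ and
$t'(u'_{i,j}) \triangleq \min_{b_j \in \mathcal{B}_j} \langle -v'_j, b_j \rangle$

such that $h'(u'_{i,j}) = s'(u'_{i,j}) + t'(u'_{i,j})$. Then

$s'(u'_{i,j}) = \min_{a_i \in \mathcal{A}_i} \langle -u'_i, a_i \rangle$
$\quad = \min_{a_i \in \mathcal{A}_i} \left( -u'_{i,j} a_{i,j} - \langle \tilde{u}'_i, \tilde{a}_i \rangle \right)$
$\quad = \min\left( -S'_{i,0},\; -u'_{i,j} - S'_{i,1} \right)$,

$t'(u'_{i,j}) = \min_{b_j \in \mathcal{B}_j} \langle -v'_j, b_j \rangle$
$\quad = \min_{b_j \in \mathcal{B}_j} \left( +u'_{i,j} b_{j,i} - \langle \tilde{v}'_j, \tilde{b}_j \rangle \right)$
$\quad = \min\left( -T'_{j,0},\; +u'_{i,j} - T'_{j,1} \right)$.

As can be seen from Fig. 4, the functions $s'(u'_{i,j})$ and
$t'(u'_{i,j})$ are both piece-wise linear functions. Whereas
the function $s'(u'_{i,j})$ is flat up to $u'_{i,j} = S'_{i,0} - S'_{i,1}$
and then has slope $-1$, the function $t'(u'_{i,j})$ increases
with slope $+1$ up to $u'_{i,j} = T'_{j,1} - T'_{j,0}$ and is then flat.
From Fig. 4 it can also be seen that, independently of whether
$S'_{i,0} - S'_{i,1}$ is larger or smaller than $T'_{j,1} - T'_{j,0}$, the
function $h'(u'_{i,j})$ always consists of three parts: first it
increases with slope $+1$, then it is flat, and finally it
decreases with slope $-1$. From these observations, the
lemma statement follows.

Fig. 4. Illustration of the functions $s'(u'_{i,j})$, $t'(u'_{i,j})$, and
$h'(u'_{i,j})$ appearing in the proof of Lemma 5. Left plots: exemplary
case for $S'_{i,0} - S'_{i,1} > T'_{j,1} - T'_{j,0}$. Right plots: exemplary case
for $S'_{i,0} - S'_{i,1} < T'_{j,1} - T'_{j,0}$.

Footnote 7: We have adapted the text for maximizations instead of
minimizations.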

E Comment to Conjecture 6

This section briefly discusses a concave function where
a coordinate-ascent approach does not find the global
maximum. Let $0 < a < 1$ and let

$f(x_1, x_2) \triangleq \min\left( -x_1 + x_2 + a(x_1 + x_2),\;
                                 +x_1 - x_2 + a(x_1 + x_2) \right)$.

The level curves of $f(x_1, x_2)$ are shown in Fig. 5. By
choosing $(x_1, x_2) = (\alpha, \alpha)$ and letting $\alpha$ go to $\infty$ we
see that this function is unbounded.

Fig. 5. Level curves of $f(x_1, x_2)$ in Sec. E.

Consider now the optimization problem

maximize   $f(x_1, x_2)$
subject to $(x_1, x_2) \in X$,

where $X$ is some suitably chosen closed convex subset
of $\mathbb{R}^2$. Assume that a coordinate-ascent-type method
has e.g. found the point $(x_1, x_2) = (0, 0)$ with
$f(0, 0) = 0$. (Of course, we assume that $(0, 0) \in X$.)
Unfortunately, at this point the coordinate-ascent-type
method cannot make any progress because
$f(x_1, 0) = \min\left( -(1 - a) x_1,\; (1 + a) x_1 \right) < 0$ for all
$x_1 \ne 0$ and
$f(0, x_2) = \min\left( (1 + a) x_2,\; -(1 - a) x_2 \right) < 0$ for all
$x_2 \ne 0$.
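
The stalling behaviour described above is easy to reproduce numerically. The sketch below evaluates $f$ along both coordinate axes through the origin and along the diagonal; the value $a = 0.5$, the grid of step sizes, and the function names are illustrative assumptions.

```python
def f(x1, x2, a):
    """The concave, piece-wise linear example function of this section."""
    return min(-x1 + x2 + a * (x1 + x2), x1 - x2 + a * (x1 + x2))

def best_axis_move(a, steps):
    """Largest value of f reachable from (0, 0) by changing a single coordinate."""
    along_x1 = max(f(t, 0.0, a) for t in steps)
    along_x2 = max(f(0.0, t, a) for t in steps)
    return max(along_x1, along_x2)

a = 0.5
steps = [k / 10.0 for k in range(-50, 51)]   # includes the step 0
stuck_value = best_axis_move(a, steps)        # no single-coordinate move beats f(0, 0)
diagonal_value = f(1.0, 1.0, a)               # changing both coordinates gives 2 * a
```

Here the best single-coordinate move leaves the value at $f(0, 0) = 0$, whereas a joint move along the diagonal strictly improves it, which is the obstacle to a direct extension of the convergence proof.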

However, defining

$f^{(\kappa)}(x_1, x_2) \triangleq {\min}^{(\kappa)}\left( -x_1 + x_2 + a(x_1 + x_2),\;
                                                   +x_1 - x_2 + a(x_1 + x_2) \right)$,

where $\kappa \in \mathbb{R}_{++}$ is arbitrary, a coordinate-ascent
method can successfully be used for the "softened"
optimization problem

maximize   $f^{(\kappa)}(x_1, x_2)$
subject to $(x_1, x_2) \in X$.